Run queries through ad-hoc SSH tunnels #4797
Conversation
redash/query_runner/__init__.py
Outdated
```python
@property
def host(self):
    if "host" in self.configuration:
        return self.configuration["host"]
    else:
        raise NotImplementedError()

@host.setter
def host(self, host):
    if "host" in self.configuration:
        self.configuration["host"] = host
    else:
        raise NotImplementedError()

@property
def port(self):
    if "port" in self.configuration:
        return self.configuration["port"]
    else:
        raise NotImplementedError()

@port.setter
def port(self, port):
    if "port" in self.configuration:
        self.configuration["port"] = port
    else:
        raise NotImplementedError()
```
Let's add a comment explaining the purpose of these properties.
👍 8da215b
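For context, the purpose of these properties is to let tunnel-setup code read a data source's real address and temporarily rewrite it to the local end of a forwarded port. A minimal self-contained sketch; the class name and the usage at the bottom are invented for illustration, not Redash's actual code:

```python
class QueryRunnerSketch:
    """Illustrative stand-in for a Redash query runner.

    The host/port properties let SSH-tunnel code read a data source's
    real address and temporarily point it at the tunnel's local bind;
    runners whose configuration has no host/port raise
    NotImplementedError so tunneling can be skipped for them.
    """

    def __init__(self, configuration):
        self.configuration = configuration

    @property
    def host(self):
        if "host" in self.configuration:
            return self.configuration["host"]
        raise NotImplementedError()

    @host.setter
    def host(self, host):
        if "host" in self.configuration:
            self.configuration["host"] = host
        else:
            raise NotImplementedError()


runner = QueryRunnerSketch({"host": "db.internal", "port": 5432})
remote = runner.host       # remember the real host
runner.host = "127.0.0.1"  # point the runner at the tunnel's local end
print(remote, runner.host)
```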
@arikfr forceful schema refreshes can't be handled by gunicorn, so I offloaded them to RQ and polled for their results using the /jobs endpoint.
redash/tasks/general.py
Outdated
```python
    return True


@job("schemas", queue_class=Queue, at_front=True, timeout=30, ttl=90)
```
Unfortunately for some data sources timeout needs to be much longer. Let's start with 5 minutes.
```python
    "result": result,
    "query_result_id": query_result_id,
```
To avoid accumulating tech debt, maybe we can return `query_result_id` only when it's an `execute_query` task?
(only if it's a simple check)
If we want to avoid accumulating tech debt, we should probably always return `result` and ditch `query_result_id`, no?
But this will break scripts that use `/jobs` for polling and expect `query_result_id`... :| We could really use API versioning at this point.
I agree, that's why I went for doubling `result` and `query_result_id`, as the cheapest thing that won't break and won't be too awkward :(
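The earlier suggestion (returning `query_result_id` only for `execute_query` jobs) could be expressed roughly like this; the function name and payload shape are invented for illustration, since the actual serializer isn't shown in this diff:

```python
def serialize_job_result(job_name, result):
    """Hypothetical sketch of building the /jobs polling payload.

    Includes the legacy query_result_id key only for execute_query
    jobs, so other job types (e.g. schema refreshes) don't carry it.
    """
    payload = {"result": result}
    if job_name == "execute_query":
        # Keep the legacy key so existing polling scripts don't break.
        payload["query_result_id"] = result
    return payload


print(serialize_job_result("execute_query", 42))
print(serialize_job_result("get_schema", [{"name": "users"}]))
```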
```python
    except NotSupported:
        response["error"] = {
            "code": 1,
            "message": "Data source type does not support retrieving schema",
        }
    except Exception:
        response["error"] = {"code": 2, "message": "Error retrieving schema."}
```
We need to keep reporting these error codes. For example, we might show different UI when it's not supported. And we don't want to throw random errors at the user but rather show "Error retrieving schema." (it might make sense to show some of the errors, but it's beyond the scope of this PR).
I thought so too, but then I realized we aren't really using the error codes defined in the service. The current implementation (returning `[]` on `NotSupported` from the backend) mimics the same behavior we have today: silently failing on `NotSupported` and showing a "Schema refresh failed." message when errors occur.
I know that we don't really use them anymore in code -- this is something that happened during the conversion to React, but we should someday. It's better to be explicit about this.
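To make the "different UI when it's not supported" idea concrete, a frontend could branch on the codes kept above. A hedged sketch; the function and return values are invented, not Redash's actual React code:

```python
# Error codes as defined in the backend's except blocks above.
NOT_SUPPORTED = 1
GENERIC_ERROR = 2


def schema_ui_state(response):
    """Hypothetical mapping from a schema response to UI behavior."""
    error = response.get("error")
    if error is None:
        return "show schema"
    if error["code"] == NOT_SUPPORTED:
        # Different UI: hide the schema browser entirely.
        return "hide schema browser"
    # Don't surface random backend errors to the user.
    return "Error retrieving schema."


print(schema_ui_state({"error": {"code": 1, "message": "n/a"}}))
```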
@arikfr @rauchy I have a few detailed questions about the job-based schema fetching, as I think it adds some risks regarding performance and user experience.
Apologies for not adding these comments before merging; I thought commenting here rather than in a new ticket gives more context, but please let me know if you want me to open one anyway.
```python
    return True


@job("schemas", queue_class=Queue, at_front=True, timeout=300, ttl=90)
```
@arikfr @rauchy I think this job adds a big risk of filling up Redis when a lot of people try to load a query or refresh the schema, since it stores the job result for the default 500 seconds and a data source schema can be quite big depending on the number of tables.
Since the result is effectively used only once when loaded, I would suggest a small value of 30 seconds, and making it a configurable number. WDYT?
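One way to express that suggestion: read the TTL from a setting and pass it to RQ's `result_ttl` parameter. A sketch only; `REDASH_SCHEMA_RESULT_TTL` is an invented setting name, and the commented decorator shows where the value would be used:

```python
import os

# Hypothetical, configurable lifetime (in seconds) for stored schema
# job results; the environment variable name is an invented example.
SCHEMA_RESULT_TTL = int(os.environ.get("REDASH_SCHEMA_RESULT_TTL", "30"))

# RQ's job decorator accepts result_ttl to control how long a stored
# result lives in Redis, e.g.:
#
#   @job("schemas", queue_class=Queue, at_front=True, timeout=300,
#        ttl=90, result_ttl=SCHEMA_RESULT_TTL)
#   def get_schema(data_source_id, refresh):
#       ...

print(SCHEMA_RESULT_TTL)
```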
```javascript
function getSchema(dataSource, refresh = undefined) {
  if (!dataSource) {
    return Promise.resolve([]);
  }

  const fetchSchemaFromJob = (data) => {
    return sleep(1000).then(() => {
```
This adds a noticeable delay to loading the schema browser, without any indication that the schema is loading. Could we add a "Loading... please wait" screen or something similar? Or reduce the sleep interval to something closer to an imperceptible level?
Also, in case a worker failure prevents the job from ever returning a good result (e.g. Redis key eviction), could we add an upper limit on the number of tries to load the data, and then show a button to fetch the schema anew with a new job?
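The retry-limit idea could look roughly like this. A hedged sketch: the function, statuses, and payload shape are invented rather than taken from Redash's frontend, and it's written in Python for brevity:

```python
import time


def poll_job(fetch_status, interval=1.0, max_tries=30):
    """Poll a jobs endpoint until the job finishes or we give up.

    fetch_status is a callable returning a dict like {"status": ...};
    the status names here are illustrative, not Redash's actual schema.
    """
    for _ in range(max_tries):
        job = fetch_status()
        if job["status"] in ("finished", "failed"):
            return job
        time.sleep(interval)
    # Upper limit reached: the caller can show a "retry" button instead
    # of polling forever for a lost job (e.g. after Redis key eviction).
    return {"status": "timed_out"}


responses = iter([{"status": "started"}, {"status": "finished", "result": []}])
print(poll_job(lambda: next(responses), interval=0))
```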
Oh, one other comment: Marina and I spent a lot of time on a custom schema drawer feature in our fork (it was also merged here a while ago but was backed out after feedback from Arik): mozilla@109eac8. Since this PR adds a background job to load the schema, I want to point out that our schema drawer feature has modified the
The results are lower traffic to cost-intensive data sources (less $$$), immediate responses to the user, and the ability to provide more details about individual data source tables and columns. If you'd like to get a quick demo again, please let me know; we're obviously eager to have alignment with you.
@arikfr @rauchy We're about to start our rebase downstream to pull in these changes, and I was wondering if you had seen my comments above about the anti-patterns in this feature: loading the schema with a noticeable 1-second delay, and a risk of race conditions if an RQ worker doesn't process the `get_schema` job?
We need to do a follow-up on this:
Adding a lock could be nice, but I'm not sure if it's needed once we move the caching functionality to the API layer. Basically, at this point it's back to the same behavior as before (in terms of number of restarts).
What type of PR is this? (check all applicable)
Description
Redash requires network access to the data sources it needs to query, but sometimes these data sources aren't publicly available. A way around this is SSH tunnels, which can be set up manually outside Redash, but can also be set up automatically before a connection is made and closed automatically once they are no longer required.
This PR sets up ad-hoc SSH tunnels for data sources if they have an `ssh_tunnel` dict in their options, which must include the `ssh_host` and `ssh_username` of the bastion server to connect through.

In order to enable SSH tunnels, `dynamic_settings.ssh_tunnel_auth()` must be implemented to allow the Redash instance to authenticate with the bastion server(s).

Related Issues
#2013
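For readers picking this up, a minimal sketch of what the options described above might look like. Only `ssh_tunnel`, `ssh_host`, and `ssh_username` come from the PR text; the host/port values and the helper function are invented for illustration:

```python
# Hypothetical data source options with ad-hoc SSH tunneling enabled.
# The ssh_tunnel / ssh_host / ssh_username keys follow the PR
# description; the concrete values are made-up examples.
options = {
    "host": "internal-db.example.local",  # only reachable via the bastion
    "port": 5432,
    "ssh_tunnel": {
        "ssh_host": "bastion.example.com",
        "ssh_username": "redash",
    },
}


def uses_ssh_tunnel(options):
    """Illustrative check for whether a tunnel should be opened."""
    return "ssh_tunnel" in options


print(uses_ssh_tunnel(options))
```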